discriminative question
A General Framework for Producing Interpretable Semantic Text Embeddings
Sun, Yiqun, Huang, Qiang, Tang, Yixuan, Tung, Anthony K. H., Yu, Jun
Semantic text embedding is essential to many tasks in Natural Language Processing (NLP). While black-box models are capable of generating high-quality embeddings, their lack of interpretability limits their use in tasks that demand transparency. Recent approaches have improved interpretability by leveraging domain-expert-crafted or LLM-generated questions, but these methods rely heavily on expert input or well-designed prompts, which restricts their generalizability and their ability to generate discriminative questions across a wide range of tasks. To address these challenges, we introduce CQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering), a general framework for producing interpretable semantic text embeddings across diverse tasks. Our framework systematically generates highly discriminative, low-cognitive-load yes/no questions through the CQG method and answers them efficiently with the MBQA model, resulting in interpretable embeddings in a cost-effective manner. We validate the effectiveness and interpretability of CQG-MBQA through extensive experiments and ablation studies, demonstrating that it delivers embedding quality comparable to many advanced black-box models while maintaining inherent interpretability. Additionally, CQG-MBQA outperforms other interpretable text embedding methods across various downstream tasks.
Text embedding is a cornerstone of Natural Language Processing (NLP), transforming texts (whether sentences, paragraphs, or full documents) into embedding vectors that capture their semantic meaning. In semantic embedding spaces, the similarity between texts is represented by the proximity of their embedding vectors, typically measured using Euclidean distance, cosine distance, or inner product.
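As a rough illustration of the proximity measures mentioned above, the sketch below compares two embeddings whose dimensions are answers to yes/no questions (1 for "yes", 0 for "no"). The questions, the answer vectors, and the helper names are hypothetical and chosen only for illustration; this is not the CQG-MBQA implementation.

```python
import math

# Hypothetical yes/no questions; each one corresponds to one embedding dimension.
QUESTIONS = [
    "Does the text discuss a medical treatment?",
    "Does the text mention a sports event?",
    "Does the text describe a clinical trial?",
    "Does the text express a personal opinion?",
]

def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return inner_product(a, b) / (norm_a * norm_b)

def shared_yes_questions(questions, a, b):
    """Questions answered 'yes' for both texts: a readable reason for similarity."""
    return [q for q, x, y in zip(questions, a, b) if x == 1 and y == 1]

# Made-up binary answers for two texts (assumed, not real model output).
text_a = [1, 0, 1, 0]
text_b = [1, 0, 0, 0]

print(cosine_similarity(text_a, text_b))
print(euclidean_distance(text_a, text_b))
print(inner_product(text_a, text_b))
print(shared_yes_questions(QUESTIONS, text_a, text_b))
```

Because every dimension is tied to a question, the overlap returned by `shared_yes_questions` can be read directly as an explanation of why two texts score as similar, which a dense black-box vector does not offer.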
Black-box text embedding methods, such as Sentence-BERT (Reimers & Gurevych, 2019), SimCSE (Gao et al., 2021), WhitenedCSE (Zhuo et al., 2023), and AnglE (Li & Li, 2024), excel at generating high-quality embeddings by training on vast amounts of data. These models are highly effective at capturing semantic similarities, making them indispensable for a variety of NLP tasks (Muennighoff et al., 2023). However, their black-box nature leaves the embeddings opaque to human users.
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Kuan, Chun-Yi, Huang, Wei-Ping, Lee, Hung-yi
Large audio-language models (LALMs) enhance traditional large language models by integrating audio perception capabilities, allowing them to tackle audio-related tasks. Previous research has primarily focused on assessing the performance of LALMs across various tasks, while overlooking their reliability, particularly with respect to issues such as object hallucination. In our study, we introduce methods to assess the extent of object hallucination in publicly available LALMs. Our findings reveal that LALMs are comparable to specialized audio captioning models in their understanding of audio content, but struggle to answer discriminative questions, specifically those requiring the identification of the presence of particular object sounds within an audio clip. This limitation highlights a critical weakness in current LALMs: their inadequate understanding of discriminative queries. Moreover, we explore the potential of prompt engineering to enhance LALMs' performance on discriminative questions.
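One way to picture a discriminative evaluation of the kind described above is to ask a yes/no question about whether a given object sound is present and then score the answers against ground-truth labels. The sketch below is a minimal illustration under stated assumptions: the question template, the choice of metrics (accuracy, precision, recall, and the share of "yes" answers), and the sample data are all hypothetical and are not taken from the paper.

```python
def build_question(obj):
    """Hypothetical prompt template for a presence question about one sound."""
    return f"Is there a sound of {obj} in the audio?"

def score_answers(answers, labels):
    """Score yes/no answers against boolean ground truth (True = sound present).

    answers: list of strings, each "yes" or "no" (case-insensitive).
    labels:  list of bools of the same length.
    """
    preds = [a.strip().lower() == "yes" for a in answers]
    tp = sum(p and l for p, l in zip(preds, labels))          # said yes, present
    fp = sum(p and not l for p, l in zip(preds, labels))      # said yes, absent
    tn = sum(not p and not l for p, l in zip(preds, labels))  # said no, absent
    fn = sum(not p and l for p, l in zip(preds, labels))      # said no, present
    total = len(labels)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # A high share of "yes" answers on absent objects suggests hallucination.
        "yes_ratio": (tp + fp) / total if total else 0.0,
    }

# Made-up example: four questions, two about sounds that are actually present.
objects = ["a dog", "a siren", "rain", "a piano"]
labels = [True, False, False, True]
answers = ["yes", "yes", "no", "yes"]  # assumed model replies, not real output

for obj in objects:
    print(build_question(obj))
print(score_answers(answers, labels))
```

In this made-up sample the single false "yes" (the siren) lowers precision while recall stays perfect, which is the pattern one would expect from a model that tends to affirm objects that are not in the clip.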
Resolving Intent Ambiguities by Retrieving Discriminative Clarifying Questions
Task-oriented dialogue systems generally employ intent detection systems to map user queries to a set of pre-defined intents. However, user queries expressed in natural language can easily be ambiguous, so such a direct mapping might not be straightforward, harming intent detection and eventually the overall performance of a dialogue system. Moreover, acquiring domain-specific clarification questions is costly. To disambiguate queries that are ambiguous between two intents, we propose a novel method of generating discriminative questions using a simple rule-based system that can take advantage of any question generation system without requiring annotated data of clarification questions. Our approach aims at discrimination between two intents but can be easily extended to clarification over multiple intents. Seeking clarification from the user to classify user intents not only helps understand the user intent effectively, but also reduces the roboticity of the conversation and makes the interaction considerably more natural.